An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus

Authors

  • Stergos D. Afantenos
  • Nicholas Asher
  • Farah Benamara
  • Myriam Bras
  • Cécile Fabre
  • Mai Ho-Dac
  • Anne Le Draoulec
  • Philippe Muller
  • Marie-Paule Péry-Woodley
  • Laurent Prévot
  • Josette Rebeyrolle
  • Ludovic Tanguy
  • Marianne Vergez-Couret
  • Laure Vieu
Abstract

We describe the Annodis corpus of discourse structures for French. The corpus combines two perspectives on discourse, applied to a variety of textual genres: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that vary in genre, length and type of discourse organisation. Our methodology involved the iterative design of annotation guidelines in order to reach satisfactory levels of inter-annotator agreement. This allows us to raise a few issues relevant to the comparison of objects as complex as discourse structures. The corpus is intended as a source of empirical evidence for discourse theories, and we present the first analyses based on the annotations: for instance, testing hypotheses about the constraints governing discourse structure, and studying special constructs such as enumerations.
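To make "inter-annotator agreement" concrete, here is a minimal sketch of Cohen's kappa, one common chance-corrected agreement measure for two annotators. It is an illustration only: the abstract does not say which measure ANNODIS used, and the sketch assumes both annotators have already labelled the same, pre-aligned pairs of discourse units. That is a simplification, since comparing whole discourse structures is harder, which is exactly the issue the abstract raises. The function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same, pre-aligned items.

    Hypothetical helper: assumes each position i in both lists refers to the
    same pair of discourse units, which real discourse annotation does not guarantee.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical relation labels for six unit pairs (not taken from the corpus).
annotator_a = ["Elaboration", "Narration", "Elaboration",
               "Contrast", "Entity-Elaboration", "Elaboration"]
annotator_b = ["Elaboration", "Narration", "Entity-Elaboration",
               "Contrast", "Entity-Elaboration", "Elaboration"]

print(round(cohen_kappa(annotator_a, annotator_b), 3))  # prints 0.769
```

Here the annotators agree on 5 of 6 items (observed agreement 0.833), but kappa discounts the agreement expected by chance from their label distributions, giving 10/13 ≈ 0.769. The single disagreement is an Elaboration / Entity-Elaboration confusion, the kind of confusion discussed in one of the related papers listed below.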


Similar resources

Corpus Annotation of Macro Discourse Structures

We present our discourse annotation project, ANNODIS, which aims to make available a diversified French corpus annotated with discourse information, along with a set of tools for annotation and corpus exploitation. An original aspect of the project is that it combines two theoretically and methodologically different points of view on discourse: bottom-up and top-down. In the bottom-up perspecti...


Unsupervised extraction of semantic relations using discourse cues

This paper presents a knowledge base containing triples involving pairs of verbs associated with semantic or discourse relations. The relations in these triples are marked by discourse connectors between two adjacent instances of the verbs in the triple in the large French corpus, frWaC. We detail several measures that evaluate the relevance of the triples and the strength of their association....


Unsupervised extraction of semantic relations (Extraction non supervisée de relations sémantiques lexicales) [in French]

This paper presents a knowledge base containing triples involving pairs of verbs associated with semantic or discourse relations. The relations in these triples are marked by discourse connectors between two adjacent instances of the verbs in the triple in the large French corpus, frWaC. We detail several measures that evaluate the relevance of the triples and the strength of their association....


Exploiting naive vs expert discourse annotations: an experiment using lexical cohesion to predict Elaboration / Entity-Elaboration confusions

This paper contributes to the field of discourse annotation of corpora. Using ANNODIS, a French corpus annotated with discourse relations by naive and expert annotators, we focus on two of these relations, Elaboration and Entity-Elaboration. These two very frequent relations are (a) often confused by naive annotators and (b) difficult to detect automatically, as their signalling is poorly studied. We...


Automatically identifying implicit discourse relations using annotated data and raw corpora (Identification automatique des relations discursives « implicites » à partir de données annotées et de corpus bruts) [in French]

This paper presents a system for identifying « implicit » discourse relations (that is, relations that are not marked by a discourse connective). Given the small amount of annotated data available for this task, our system also resorts to additional automatically labeled data wherein unambiguous connect...




Publication date: 2012